388 research outputs found
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting the attention and
investment of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, user-machine interaction and so on. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used
features, methods, challenges, and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Left/Right Hand Segmentation in Egocentric Videos
Wearable cameras allow people to record their daily activities from a
user-centered (First Person Vision) perspective. Due to their favorable
location, wearable cameras frequently capture the hands of the user, and may
thus represent a promising user-machine interaction tool for different
applications. Existing First Person Vision methods handle hand segmentation as
a background-foreground problem, ignoring two important facts: i) hands are not
a single "skin-like" moving element, but a pair of interacting cooperative
entities, ii) close hand interactions may lead to hand-to-hand occlusions and,
as a consequence, create a single hand-like segment. These facts complicate a
proper understanding of hand movements and interactions. Our approach extends
traditional background-foreground strategies, by including a
hand-identification step (left-right) based on a Maxwell distribution of angle
and position. Hand-to-hand occlusions are addressed by exploiting temporal
superpixels. The experimental results show that, in addition to a reliable
left/right hand-segmentation, our approach considerably improves the
traditional background-foreground hand-segmentation.
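The abstract names the key idea — scoring a left-vs-right hypothesis with a Maxwell distribution over segment angle and position — without giving the exact statistic. The sketch below is an illustrative reconstruction under assumed conventions: the combined position/angle statistic, the normalization, and the scale parameters `a_left`/`a_right` are hypothetical, not taken from the paper.

```python
import math

def maxwell_pdf(x, a):
    """Maxwell-Boltzmann pdf with scale parameter a, defined for x >= 0."""
    if x <= 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2 * a * a)) / a ** 3

def identify_hand(cx, angle_deg, frame_w, a_left=0.4, a_right=0.4):
    """Label a hand-like segment 'left' or 'right'.

    cx: centroid x-coordinate (pixels); angle_deg: major-axis angle
    (positive = leaning right). In egocentric video the left hand tends
    to enter from the lower left leaning right, and vice versa; each
    hypothesis is scored by a Maxwell pdf over a combined
    position/angle statistic. Scale parameters are illustrative.
    """
    x_norm = cx / float(frame_w)       # 0 = left edge, 1 = right edge
    ang = angle_deg / 90.0             # roughly in [-1, 1]
    # Nonnegative evidence for each hypothesis (clipped at zero).
    left_stat = max(0.0, (1.0 - x_norm) + ang) / 2.0
    right_stat = max(0.0, x_norm - ang) / 2.0
    p_left = maxwell_pdf(left_stat, a_left)
    p_right = maxwell_pdf(right_stat, a_right)
    return 'left' if p_left > p_right else 'right'
```

A segment near the left border leaning right is labelled `'left'`; the mirrored configuration is labelled `'right'`. The hand-to-hand occlusion handling via temporal superpixels is not sketched here.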
Average consensus-based asynchronous tracking
Target tracking in a network of wireless cameras may fail if data are captured or exchanged asynchronously. Unlike traditional sensor networks, video processing may generate significant delays that also vary from camera to camera. Moreover, the continuous and rapid change of the dynamics of the consensus variable (the target state) makes tracking even more challenging under these conditions. To address this problem, we propose a consensus approach that enables each camera to predict information of other cameras with respect to its own capturing time-stamp based on the received information. This prediction is key to compensate for asynchronous data exchanges. Simulations show the performance improvement with the proposed approach compared to the state of the art in the presence of asynchronous frame captures and random processing delays.
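The core mechanism described — propagate each neighbor's estimate to the local capture time before the consensus update — can be sketched in a few lines. This is a minimal illustration assuming a constant-velocity motion model and a standard average-consensus step with gain `eps`; the paper's actual state model and gain are not specified in the abstract.

```python
def predict(state, dt):
    """Constant-velocity prediction: state = (position, velocity)."""
    pos, vel = state
    return (pos + vel * dt, vel)

def consensus_step(own_state, own_t, neighbors, eps=0.3):
    """One asynchronous average-consensus update for a single camera.

    neighbors: list of (state, timestamp) messages received over the
    network. Each neighbor estimate is first propagated to this
    camera's own capture time, compensating for asynchronous frame
    captures and variable processing delays.
    """
    pos, vel = own_state
    dp = dv = 0.0
    for n_state, n_t in neighbors:
        n_pos, n_vel = predict(n_state, own_t - n_t)   # align timestamps
        dp += n_pos - pos
        dv += n_vel - vel
    return (pos + eps * dp, vel + eps * dv)
```

Without the `predict` step, a camera would average stale neighbor states against its own fresher estimate, biasing the consensus value whenever the target moves between capture instants.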
Unsupervised Understanding of Location and Illumination Changes in Egocentric Videos
Wearable cameras stand out as one of the most promising devices for the
upcoming years, and as a consequence, the demand of computer algorithms to
automatically understand the videos recorded with them is increasing quickly.
An automatic understanding of these videos is not an easy task, and their mobile
nature implies important challenges to be faced, such as the changing light
conditions and the unrestricted locations recorded. This paper proposes an
unsupervised strategy based on global features and manifold learning to endow
wearable cameras with contextual information regarding the light conditions and
the location captured. Results show that non-linear manifold methods can
capture contextual patterns from global features without demanding large
computational resources. The proposed strategy is used, as an application case,
as a switching mechanism to improve the hand-detection problem in egocentric
videos.
Comment: Submitted for publication
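The switching mechanism the abstract describes — cluster global frame features by context, then pick the detector matching the current context — can be illustrated compactly. The paper uses non-linear manifold learning; the sketch below substitutes a plain k-means stage as a stand-in for that step, and the per-channel-mean "global feature" and the detector names are hypothetical.

```python
import random

def global_feature(frame):
    """Toy global descriptor: per-channel mean of an RGB frame given as
    a list of (r, g, b) pixels. The paper uses richer global features."""
    n = float(len(frame))
    return tuple(sum(p[c] for p in frame) / n for c in range(3))

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means used here as a stand-in for the paper's
    manifold-learning + clustering stage (illustrative only)."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def select_detector(feature, centers, detectors):
    """Switching mechanism: pick the hand detector trained for the
    illumination/location cluster closest to the current frame."""
    j = min(range(len(centers)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(feature, centers[c])))
    return detectors[j]
```

At run time, each frame's global feature is assigned to its nearest context cluster, and the hand detector trained under that context is applied.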
Advantages of dynamic analysis in HOG-PCA feature space for video moving object classification
Classification of moving objects for video surveillance applications still remains a challenging problem due to inherently changing video conditions such as lighting and resolution. This paper proposes a new approach for vehicle/pedestrian object classification based on the learning of a static kNN classifier, a dynamic Hidden Markov Model (HMM)-based classifier, and the definition of a fusion rule that combines the two outputs. The main novelty consists in the study of the dynamic aspects of the moving objects by analysing the trajectories followed by the features in the HOG-PCA feature space, instead of the classical trajectory study based on frame coordinates. The complete hybrid system was tested on the VIRAT database and worked in real time, yielding up to 100% peak accuracy rate in the tested video sequences.
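The hybrid pipeline — a static kNN stage, a dynamic HMM stage over the feature trajectory, and a fusion rule — can be sketched with toy stand-ins. Here 1-D values stand in for HOG-PCA coordinates, the HMM parameters are invented for illustration, and the weighted-sum fusion rule is an assumption (the paper's actual rule is not given in the abstract).

```python
def knn_posterior(x, train, k=3):
    """Static stage: fraction of the k nearest training samples
    (1-D stand-in for HOG-PCA coordinates) labelled 'vehicle'."""
    nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    return sum(1 for _, lab in nearest if lab == 'vehicle') / float(k)

def hmm_likelihood(obs, start, trans, emit):
    """Dynamic stage: forward-algorithm likelihood of a quantized
    feature-space trajectory under one class's discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

def fuse(static_p, lik_vehicle, lik_pedestrian, w=0.5):
    """Illustrative fusion rule: weighted sum of the static posterior
    and the normalized dynamic likelihood for 'vehicle'."""
    dyn_p = lik_vehicle / (lik_vehicle + lik_pedestrian)
    return w * static_p + (1.0 - w) * dyn_p
```

A track is classified as a vehicle when the fused score exceeds 0.5; the dynamic stage lets a trajectory that moves like a vehicle in feature space override an ambiguous single-frame appearance.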
Use of Time-Frequency Analysis and Neural Networks for Mode Identification in a Wireless Software-Defined Radio Approach
The use of time-frequency distributions is proposed as a nonlinear signal processing technique that is combined with a pattern recognition approach to identify superimposed transmission modes in a reconfigurable wireless terminal based on software-defined radio techniques. In particular, a software-defined radio receiver is described aiming at the identification of two coexistent communication modes: frequency hopping code division multiple access and direct sequence code division multiple access. As a case study, two standards, based on the previous modes and operating in the same band (industrial, scientific, and medical), are considered: IEEE WLAN 802.11b (direct sequence) and Bluetooth (frequency hopping). Neural classifiers are used to obtain identification results. A comparison between two different neural classifiers is made in terms of relative error frequency.
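The time-frequency front end can be illustrated with a toy spectrogram feature: frequency hopping (Bluetooth-like) shows a peak frequency that jumps between frames, while a stationary carrier does not. This sketch uses a naive frame-wise DFT and a variance statistic as an assumed discriminative feature; the paper feeds richer time-frequency features to neural classifiers, which are not reproduced here.

```python
import cmath
import math

def spectrogram_peaks(signal, frame_len=32):
    """Peak DFT bin per frame (naive DFT; a real receiver would use an
    FFT). This is the kind of time-frequency feature a classifier sees."""
    peaks = []
    for s in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[s:s + frame_len]
        mags = []
        for k in range(frame_len // 2):
            z = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            mags.append(abs(z))
        peaks.append(max(range(len(mags)), key=mags.__getitem__))
    return peaks

def hop_variance(peaks):
    """Variance of the peak bin across frames: large for a
    frequency-hopping signal, near zero for a stationary carrier."""
    m = sum(peaks) / float(len(peaks))
    return sum((p - m) ** 2 for p in peaks) / len(peaks)
```

Thresholding `hop_variance` already separates a hopping tone from a fixed tone; in the paper, neural classifiers take over this decision from richer time-frequency distributions.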
Advanced Video-Based Surveillance
Over the past decade, we have witnessed a tremendous
growth in the demand for personal security and defense of
vital infrastructure throughout the world. At the same time,
rapid advances in video-based surveillance have emerged
and offered a strategic technology to address the demands
imposed by security applications. These events have led to a
massive research effort devoted to the development of effective
and reliable surveillance systems endowed with intelligent
video-processing capabilities. As a result, advanced
video-based surveillance systems have been developed by
research groups from academia and industry alike. In broad
terms, advanced video-based surveillance could be described
as intelligent video processing designed to assist security
personnel by providing reliable real-time alerts and to
support efficient video analysis for forensic investigations.
A bio-inspired logical process for saliency detections in cognitive crowd monitoring
It is well known from physiological studies that the level of human attention for adult individuals rapidly decreases after five to twenty minutes [1]. Attention retention for a surveillance operator represents a crucial aspect in Video Surveillance applications and could have a significant impact on identifying relevance, especially in crowded situations. In this field, advanced mechanisms for selection and extraction of saliency information can improve the performance of autonomous video surveillance systems and increase the effectiveness of human operator support. In particular, crowd monitoring represents a central aspect in many practical applications for managing and preventing emergencies due to panic and overcrowding.